3,558 research outputs found

    Ms2lda.org: web-based topic modelling for substructure discovery in mass spectrometry

    Motivation: We recently published MS2LDA, a method for the decomposition of sets of molecular fragment data derived from large metabolomics experiments. To make the method more widely available to the community, here we present ms2lda.org, a web application that allows users to upload their data, run MS2LDA analyses and explore the results through interactive visualisations. Results: Ms2lda.org takes tandem mass spectrometry data in many standard formats and allows the user to infer the sets of fragment and neutral loss features that co-occur together (Mass2Motifs). As an alternative workflow, the user can also decompose a dataset onto predefined Mass2Motifs. This is accomplished through the web interface or programmatically from our web service

    Health-Related Fitness Knowledge Growth in Middle School Years: Individual- and School-Level Correlates

    Background: Health-related fitness knowledge (HRFK) has been an essential concept for many health and physical education programs. There has been limited understanding and longitudinal investigation on HRFK growth. This longitudinal study examined HRFK growth and its individual- and school-level correlates in middle school years under 1 curriculum condition: Five for Life. Methods: Participants were 12,044 students from 47 middle schools. Data were collected at both individual/participant and school/institution levels. Individual-level variables included gender, grade, and HRFK test scores. School-level variables included percentage of students receiving free and reduced meals (FARM), student-to-faculty ratio for physical education, and school academic performance (SAP). We used hierarchical linear modeling to examine HRFK 3-year growth in relation to individual- and school-level correlates. Results: The average HRFK score at 6th grade for females was 42.81% ± 1.32%. The predicted HRFK growth was 17.06% ± 1.02% per year, holding other factors constant. A 1-standard deviation increase in FARM correlated with a 14.68%-point decrease in predicted test score (p = 0.02). A 1-standard deviation increase in SAP was associated with an 11.90%-point increase in HRFK score. Males had a significantly lower growth rate than females during the middle school years (0.78%/year, p = 0.02). Conclusion: The result showed that both individual- and school-level variables such as gender, FARM, and SAP influenced HRFK growth. Educators should heed gender differences in growth curves and recognize the correlates of school-level variables

    Improved Approximate Degree Bounds for k-Distinctness

    An open problem that is widely regarded as one of the most important in quantum query complexity is to resolve the quantum query complexity of the k-distinctness function on inputs of size N. While the case of k=2 (also called Element Distinctness) is well-understood, there is a polynomial gap between the known upper and lower bounds for all constants k>2. Specifically, the best known upper bound is $O(N^{(3/4)-1/(2^{k+2}-4)})$ (Belovs, FOCS 2012), while the best known lower bound for $k \geq 2$ is $\tilde{\Omega}(N^{2/3} + N^{(3/4)-1/(2k)})$ (Aaronson and Shi, J. ACM 2004; Bun, Kothari, and Thaler, STOC 2018). For any constant $k \geq 4$, we improve the lower bound to $\tilde{\Omega}(N^{(3/4)-1/(4k)})$. This yields, for example, the first proof that 4-distinctness is strictly harder than Element Distinctness. Our lower bound applies more generally to approximate degree. As a secondary result, we give a simple construction of an approximating polynomial of degree $\tilde{O}(N^{3/4})$ that applies whenever $k \leq \mathrm{polylog}(N)$
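As a rough illustration of the gap described in this abstract, the exponents of N in the quoted bounds can be compared with exact arithmetic. This is only a sketch: the function names are hypothetical, logarithmic factors hidden by the tilde notation are ignored, and the 2/3 exponent for Element Distinctness is taken from the known tight bound for k=2.

```python
from fractions import Fraction

# Exponents of N in the bounds quoted in the abstract (polylog factors ignored).
# Function names are illustrative only and do not come from the paper.

def upper_exponent(k: int) -> Fraction:
    """Exponent in the upper bound O(N^{3/4 - 1/(2^{k+2} - 4)})."""
    return Fraction(3, 4) - Fraction(1, 2 ** (k + 2) - 4)

def previous_lower_exponent(k: int) -> Fraction:
    """Exponent in the earlier lower bound N^{2/3} + N^{3/4 - 1/(2k)}.

    A sum of two powers of N is dominated by the larger exponent.
    """
    return max(Fraction(2, 3), Fraction(3, 4) - Fraction(1, 2 * k))

def improved_lower_exponent(k: int) -> Fraction:
    """Exponent in the improved lower bound N^{3/4 - 1/(4k)}, stated for k >= 4."""
    return Fraction(3, 4) - Fraction(1, 4 * k)

if __name__ == "__main__":
    for k in (4, 5, 8):
        print(
            f"k={k}: previous lower {previous_lower_exponent(k)}, "
            f"improved lower {improved_lower_exponent(k)}, "
            f"upper {upper_exponent(k)}"
        )
```

For k=4 the improved exponent is 11/16, which exceeds the 2/3 exponent of Element Distinctness and is consistent with the claim that 4-distinctness is strictly harder. It remains below the upper-bound exponent of 11/15, so the polynomial gap is narrowed but not closed.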

    Filler Word Detection and Classification: A Dataset and Benchmark

    Filler words such as 'uh' or 'um' are sounds or words people use to signal they are pausing to think. Finding and removing filler words from recordings is a common and tedious task in media editing. Automatically detecting and classifying filler words could greatly aid in this task, but few studies have been published on this problem. A key reason is the absence of a dataset with annotated filler words for training and evaluation. In this work, we present a novel speech dataset, PodcastFillers, with 35K annotated filler words and 50K annotations of other sounds that commonly occur in podcasts such as breaths, laughter, and word repetitions. We propose a pipeline that leverages VAD and ASR to detect filler candidates and a classifier to distinguish between filler word types. We evaluate our proposed pipeline on PodcastFillers, compare to several baselines, and present a detailed ablation study. In particular, we evaluate the importance of using ASR and how it compares to a transcription-free approach resembling keyword spotting. We show that our pipeline obtains state-of-the-art results, and that leveraging ASR strongly outperforms a keyword spotting approach. We make PodcastFillers publicly available, and hope our work serves as a benchmark for future research. Comment: Submitted to Interspeech 202

    Analysis of Archived Residual Newborn Screening Blood Spots After Whole Genome Amplification

    Deidentified newborn screening bloodspot samples (NBS) represent a valuable potential resource for genomic research if impediments to whole exome sequencing of NBS deoxyribonucleic acid (DNA), including the small amount of genomic DNA in NBS material, can be overcome. For instance, genomic analysis of NBS could be used to define allele frequencies of disease-associated variants in local populations, or to conduct prospective or retrospective studies relating genomic variation to disease emergence in pediatric populations over time. In this study, we compared the recovery of variant calls from exome sequences of amplified NBS genomic DNA to variant calls from exome sequencing of non-amplified NBS DNA from the same individuals. Results: Using a standard alignment-based Genome Analysis Toolkit (GATK), we find 62,000-76,000 additional variants in amplified samples. After application of a unique kmer enumeration and variant detection method (RUFUS), only 38,000-47,000 additional variants are observed in amplified gDNA. This result suggests that roughly half of the amplification-introduced variants identified using GATK may be the result of mapping errors and read misalignment. Conclusions: Our results show that it is possible to obtain informative, high-quality data from exome analysis of whole genome amplified NBS with the important caveat that different data generation and analysis methods can affect variant detection accuracy, and the concordance of variant calls in whole-genome amplified and non-amplified exomes. National Institute of Health P01HD067244, NS076465, R01ES021006. Nutritional Science